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Abstract 

Current methods for determining whether a time series exhibits fractal structure (FS) rely on 
subjective assessments on estimators of the Hurst exponent (H). Here, I introduce the Bayesian 
Assessment of Scaling, an analytical framework for drawing objective and accurate inferences on the 
FS of time series. The technique exploits the scaling property of the diffusion associated to a time 
series. The resulting criterion is simple to compute and represents an accurate characterization of 
the evidence supporting different hypotheses on the scaling regime of a time series. Additionally, a 
closed-form Maximum Likelihood estimator of H is derived from the criterion, and this estimator 
outperforms the best available estimators. 
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In this study I introduce a method for deciding whether a time series exhibits fractal 
scaling (FS; [l, [2], [J) in a simple, objective, and accurate way. Determining the presence 
or absence of FS in a time series is an important question for making inferences about the 
nature of underlying system that generated it. Important conclusions concerning the nature 
of systems in a wide range of domains rest on assessing whether - and, if so, the extent to 
which - observed time series exhibit FS. Usually, this is addressed by estimating whether 
the value of the Hurst exponent (H; [3]) of the series is equal to or different from 1/2. 
Often, some additional verification is performed by looking for the consistency of mnltiple 
estimators, but the criteria for establishing FS retains a fair deal of subjectivity (see, e.g., [4j 
for a heuristic methodology for combining estimators). However, estimating the value of a 
parameter does not constitute a valid assessment of FS. Rather, in order to argue that a series 
is fractal (or, more generally, its H exponent ranges within some hypothesized interval) an 
explicit contrast between different hypotheses about H is required (see for a discussion 
distinguishing parameter estimation from hypothesis contrast). The few approaches that 
consider some hypothesis testing rely on approximative techniques with multiple adjustable 
parameters and strong assumptions on the underlying model p, 

m. 

The elements in a time series can be thought of as the fluctuations of a particle following 
a diffusion process 10j. From this perspective, for any process X(t) generating values 
with a stationary distribution fx, one can consider the diffusion process Y(t) associated to 
X(r), which is defined by dY(r)/ dr = X(r). Since what is normally available is not the 
continuous evolution of X, but rather a discretely sampled time series x = {x(l), . . . , x(N)}, 
one can instead define the corresponding random walk y on discrete time k 

k 

y(fc) = J>(i), k = l,...,N. (1) 



If the generating process X is stationary, the scaling property H[ indicates that, for large 
times, the probability distribution of the walk approaches 

P(VM = ^(£)> (2) 



where 5 is the scaling exponent associated with the time series and f^ is a limiting 



density function. The scaling exponent is closely related to the H exponent 11 



probability 



121]. In the 

cases when i x corresponds to the normal distribution - fractional Gaussian noise (fGn; {2], 3}) 
- 5 is equivalent to H. 



Let x = {x(l), . . . , x(N)} be a stationary time series of N samples, and y k the k-th order 



running sum of x, yk{i) = 1 x ti)- The running sums can be taken as samples at time 



k of an ensemble of N — 1 trajectories from the diffusion process (see Refs. [12j, ll3| and 
references therein). For large summing orders (i.e., diffusion times), the scaling condition 
should apply. The simplest case is when two a priori candidate values for the scaling 
exponent, Si and 8 2 are available. One's task is to determine which of the two scaling regimes 
better describes the complexity of the observed series x. This amounts to determining the 
Bayes factor Ai )2 = ln[p(Hi\x) /p(H 2 \x)} between two simple hypotheses H 1 = 5 = Si, and 
H 2 = 5 = 62- If Ai,2 > 0, then the data support hypothesis Hi more strongly than hypothesis 
H 2 , and the reverse holds if Ai >2 < 0. By Bayes' Theorem, Ai )2 can be decomposed into 
the sum of the a priori information a 1)2 (obtained by theoretical predictions or previous 
observations of the same process) and the evidence provided by the data 771,2 0*0 

\lA x ) = ln -7TFT + ln ~7~7TFT = a l>2 + ViA x h ( 3 ) 
Pl-n2j V{ x \ tL 2) 

Exploiting the scaling property in Eq. [21 the likelihood above can be expressed in terms of 
the k-th. running sum y k . When doing this, one should consider that the overlap between 
running sum windows introduces much redundancy in the combined evidence. This imposes 
rescaling it from its length of iV — k + 1 elements, to its maximum effective length, N/k. 
The combined evidence is therefore 

t m N N d^ 1 p(y k (i)\S = Si, k) 

Combining the scaling condition of Eq. [2] with the evidence of Eq. H] results in 

VldX{k) - (N-k + l)k ^ ln fxk(z)/^] + '-- 

+ j{S 2 -Si)lnk. (5) 

For a time series of length N, there are iV — 1 summing orders than can be investigated. 
The combined evidence across all orders corresponds to the weighted mean of the evidences 
across summing orders - note that, as all the running sums were computed from a single 
realization of the series, a plain sum would exaggerate the evidence - 
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Vl,2{ X ) 



J2(N-k + l) V i >2 (x\k). (6) 



In a more general case when a candidate value for the scaling exponent S is not available, 
but rather a range of possible values is provided, our task consists in comparing the simple 
hypothesis H = 5 = 5 and the composite hypothesis H 1 = 5 G [5i,5 2 ]. As before, 
the evidence for each hypothesis is expressed in terms of its fc-order running sum. In the 
likelihood for the composite Hypothesis Hi, 8 is a now a 'nuisance' parameter, and the 
estimation of the likelihood requires integrating it away, 

p (y\5 E [Si, 6 2 ), k) = f 2 p {y\S, k) p (5\k) d5 (7a) 

J5i 

= -^- f 5 \(y\5,k)dS (7b) 

Assuming nothing about the value of S apart from it falling in the interval [^i,^], requires 
the use of an uninformative uniform prior for it, p (5 — h\k) = l/(5 2 — 5i), changing Eq. [7a] 
into Eq. l7bl Combining the scaling condition with the log-likelihoods of the hypotheses, 
and integrating Eq. 1751 the expression for the evidence provided by summing order k > 1 
becomes 



N #> N N ^ F L [yjk,i)/h*]-F L [yjk,^] 

k 6 2 -6i (N -k+l)k f-f y(k, z)fi [y(k, i) / k 6 °] In k ' {) 

where F^(x) = f_ {l(z) d-2 is the cumulative probability function of fi [14]. 

Calculating the evidences in Eqs.15], and[8]requires an analytical expression for the limiting 
distribution f^. Strictly speaking, the scaling condition applies exactly only in the infinite 
time limit. The process of convergence to this limit is one by which the original distribution 
of the data, fx, is gradually being transformed into the limiting distribution f^. Fig. [1] 
illustrates this point. It plots the departure from normality of the diffusion associated to an 
fGn as time increases. As expected, one finds a progressive increase in the Kullback-Leibler 
Divergence (KLD; [15]) between fx and the observed distribution. Even for this simple fGn 
case, full convergence to f^ is not attained at very large summing orders, but the divergence 
from normality is rather small. 

By the scaling property, if 5 is the hypothesized scaling exponent, the expected log- 
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likelihood of the sum y k is [3] 

N 

- — 5 In k (9a) 

N N f°° 

= --5\nk + - h(z)\nh(z)dz. (9b) 
k k J -oo 

However, using i x in place of f^ to estimate the likelihood would have resulted in an estimate 
(£[y\k,N,6)]) ix = -j81nk + j J°° f L (z)lni x {z)dz, (10) 
so that the estimation error would have been 

A(k) = (1 [y\k, N, 6]), L - (l [y\k, N, S]]), x 

where K[-||-] refers to the KLD between two distributions. Eq. [TT] reflects the upper bound 
of the error in estimating the log-likelihood. In general, for two summing orders k\ < hi, 
it should hold that < k 1 A(k 1 )/N < k 2 A(k 2 )/N < K[f L ||f x ]. It is therefore clear that 
using fx instead of f^ to estimate the log-likelihood of the running sums should result in an 
underestimation of the likelihood, with the amount of underestimation bound by the KLD 
between the two distributions (the KLD between two distributions is always non- negative; 
Ijj]). In the fGn case, fx is the Gaussian distribution and the approximation is exact for 
5 = 1/2, and otherwise the underestimation increases with 5. Therefore, comparing the 
approximated log-likelihoods for 5 > 1/2 against the hypothesis that 5=1/2 will be slightly 
biased in favor of the latter null hypothesis (which is perhaps desirable). 

In the case when one has sufficient evidence for the presence of FS, it is often necessary 
to determine what the scaling exponent is. A simple way to obtain such an estimate, is to 
find the value 5 that maximizes the posterior probability of the exponent given the observed 
data. If one assumes a uniform uninformative prior on 5, this is equivalent to maximizing 
the log likelihood of the data given an exponent. Using the scaling condition provides an 
expression of the log-likelihood for each summing order (£[cc|A;, 5}) as a function of 5. The 
maxima are found where the first derivative dijdb is zero with a negative second derivative 
d 2 lj d5 2 . If 5(k) is a maximum of the likelihood then, assuming that that the log-likelihood is 
peaked around the single maximum S(k) with a shape that is roughly Gaussian, the variance 
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Figure 1. (color online) Deviation from normality, numerically estimated KLD between the Gaus- 
sian distribution (fx) and the distribution of the running sums of increasing order. For each value 
of H, the points were estimated using simulated fGn of 5 ■ 10 5 elements. 

of the estimator e(k) 2 is given by the minus reciprocal of the second derivative evaluated 
at 5(k). When fx is a zero-mean Gaussian distribution of variance a 2 , one finds that the 
log-likelihood (again approximating fx by fx) has a single maximum for all values of k > 1 
with 

In (£il fc+1 V(k, *) 2 ) - In (a 2 (N -k + 1)) 



5(k) 



and estimated error 



e{kf 



2mA; 

(N — k + l)a 2 k 2S ^ +1 



(12) 



(13) 



2iVln 2 (A;) £f = ^ +1 y(k, if ' 
This produces N — 1 estimators, one for each summing order. From these, the 5(k) with the 
smallest error e(k) is chosen as the estimator. 

In order to explore the power and accuracy of the BAS criterion and the associated H 
estimator, I generated 1,000 artificial fGn series of lengths randomly chosen between 100, 
1,000, and 10,000 elements, and real values of H = 5 uniformly sampled from the (0, 1) 
interval. Fig. [2] plots the results of using the BAS criterion to compare the theory that 
5 7^ 1/2 (or more precisely, 5 G [0, 1]) with the null hypothesis of 8 — 1/2. The figure shows 
that, when the real value of 5 was either lower than .32 or higher than .61 (these limits are 
plotted by the vertical lines), the BAS clearly detects the FS. However, for values of 5 within 
the interval [.32, .61], the BAS supports the null hypothesis more strongly. This means that 
the theory of S = 1/2 presents a better description of the data than stating that nothing is 
known about 5. Notice that this does not imply that 5 — 1/2 is the 'correct' scaling, but 
rather, that it is the best among the theories compared. Choosing a better specified theory 
to test against, would enable contrasting hypotheses with higher resolution. This shows 
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Figure 2. (color online) Performance of the BAS criterion in detecting FS for 1,000 artificially 
generated fGn series of length 100 (top), 1,000 (middle), and 10,000 (bottom). The inset focuses 
on the lower values of H. 




Figure 3. (color online) Comparison of the quality of Hurst exponent estimators for 1,000 artificially 
generated fGn series of length 100 (top), 1,000 (middle), and 10,000 (bottom). 

one of the main advantages of the BAS: It permits the comparison of theories of different 
complexities, and naturally weighs the theories by their quality (i.e., stating that nothing is 
known about 5 is a very bad theory, so any not too bad approximation should be considered 
an improvement). 

Fig. [3] compares the accuracy of the estimator proposed here with some of the most 
commonly used estimators, including Rescaled Range Analysis (R/S; , Detrended 

Fluctuation Analysis (DFA; [h]]), the local Whittle estimator [17], and the Scaled Windowed 
Variance and Spectral Power Density (SWV and SPD; in their respective improved variants 
SSC and low SPD„, je as recommended by Ref. 0]). Already for series of mild length, the 
BAS estimator shows a substantially more accurate estimation than all other methods - 
both in terms of error and consistency - across all values H < .9. Only for H > .9 did the 
performace of the BAS estimator decrease, and even then it was not worse than the other 
estimators. 
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Figure 4. (color online) Robustness of the Hurst estimation (left) and BAS criterion (right) for 
1,000 artificial fGn series (H = .75) of length 1,000 with different proportions of contamination by 
an ARMA(2,1) process. 

The next question that arises is that real world time series are hardly ever 'clean' examples 
of FS, rather, the FS structure is usually corrupted by noise of different types. Of especial 
importance are short-term correlations, that some time can give the false impression of 
FS, or can mask it away. To investigate the robustness of the BAS to this type of signal 
contamination, I generated 1,000 fGn time series of length 1,000 with H = .75 that were 
contaminated by random samples from a simulated ARMA(2, 1) process (Autoregressive 



Moving Average; 181]) . The proportion of signal amplitude driven by ARMA was randomly 
sampled from (0, 1). Fig. 0] plots the resulting evidence (comparing 5 G [0, 1] with 5 = 1/2) 
and the error in estimating 5 for each series. The BAS robustly detects the presence of 
FS up to the point when the ARMA component accounts for more than half of the signal 
power. The contamination drives down 6, and when this estimator goes below the .61 limit 
mentioned above, a plain short-term correlation provides a better account of the data. 

I have introduced an accurate method for deciding whether it is justified to claim that 
an observed series indicates FS. The BAS is fully analytical, does not have parameters that 
need to be subjectively adjusted, and its results can be computed with at most one level 
of very constrained approximation (the Gaussian proposal). The main component of the 
BAS, summarized by Eq. [8], is valid for stationary series of any type, whether they have finite 
variance, or they produce anomalous scaling of the Levy type. Additionally, if the values are 
roughly normally distributed (or can be normalized) one can use the normal approximation 
and associated H estimator. Combined, the estimator and the BAS provide a powerful tool, 
enabling the use of the estimator for generating hypotheses on one portion of the data, and 
the BAS criterion to test them on a different portion. If two theories make different scaling 
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predictions, the BAS can be applied to decide which theory the data support. The BAS is 
only valid for stationary 'noises'. If the series of interest is of the fractional Brownian motion 
type, differentiation is required before the methods can be applied. For more complex non- 
stationarity, preprocessing to isolate the stationary components of the series is required (see 
[ljj, for a recent proposal). Finally, if it is not clear that the scaling should apply at the 



earliest times 



131 ]. a later reference time ko > 2 can be chosen, and the time used in the 



equations expressed relative to it k r = k/ko. 
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